Future and Async/Await

Max Maischein

Frankfurt.pm

German version

https://corion.net/talks/

German questions are welcome

Overview

  • Will the talk help me?

  • Transition from sequential code to asynchronous with Futures

  • JavaScript / E / .NET

Who am I

  • Max Maischein

  • DZ BANK Frankfurt

  • Deutsche Zentralgenossenschaftsbank

  • Data Scientist

Motivation

  • Internal search engine

  • Web crawler / Intranet crawler -> Windows

  • Why linear? 'Cause it's easy.

Crawler?

Structure

Parallelization

Fork and threads

  • preemptive multitasking

  • It's not a matter of how well the bear dances

  • ... but that it dances at all

State machine

  • Only understandable with the picture

  • Not pictured: HTTP state machine

  • nested state machines are hard

Async / select() / NIO with callbacks

  • AnyEvent

  • Callback hell

  • JavaScript

Async/Await

async / await

  • A pair of keywords to write asynchronous code in a "nicer" way

  • Future::AsyncAwait

  • Perl 5.16+
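As a minimal sketch of what the two keywords do (assuming the Future and Future::AsyncAwait modules from CPAN are installed; `add_later` is a made-up name for illustration): an `async sub` returns a Future immediately, and `await` suspends the sub until the awaited Future has a result.

```perl
use strict;
use warnings;
use Future;
use Future::AsyncAwait;

# Hypothetical helper: waits for a value, then adds a number to it.
# Calling it returns a Future right away.
async sub add_later {
    my ($input, $n) = @_;
    my $value = await $input;    # suspends here while $input is pending
    return $value + $n;
}

my $input  = Future->new;            # nothing has arrived yet
my $result = add_later($input, 1);   # returns a pending Future

print "still waiting\n" unless $result->is_ready;

$input->done(41);                    # the value arrives; add_later() resumes
print $result->get, "\n";
```

No event loop is involved here; the pending Future is completed by hand so the suspend/resume step is visible.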

Dramatis Personae

Queue - a Queue

    our @queue;

Dramatis Personae

UA - User Agent ("Requester", "Browser"):

LWP::UserAgent

    $ua->get($url);

Future

    my $ua = Future::HTTP->new();
    $ua->http_get($url);

Dramatis Personae

Extractor

  • e.g. HTML::Selector::XPath

  • ... or regular expressions

Linear crawler

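    our %seen;             # URLs that were already visited
    our @queue = @ARGV;    # URLs that still need to be fetched
    ...
    while (@queue) {       # also safe when no URL was given
        my $url = shift @queue;
        #sleep 10;
        fetch_and_extract( $url );   # blocks until the page is processed
    }
    print for @results;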

Future as API

Ideally the same API as the synchronous one, just with ->get() tacked on:

    my $response = $ua->get($url);

becomes

    my $response_future = $ua->get( $url ); # start
    # do something else
    my $response = $response_future->get(); # use result

Future as API

Ideally like the synchronous API; instead of waiting with "->get()", a callback can be attached with "->then()":

    my $response = $ua->get($url);

becomes

    my $response_future = $ua->get( $url )->then(sub {
        my ($response) = @_;
        # use result
    }); # start

Module: Future
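Both styles, ->then() and ->get(), can be tried with the Future module alone. The following is only a sketch: `fake_get` and the `%in_flight` hash are invented stand-ins for a user agent and its backend, and the "response" is completed by hand instead of by an event loop.

```perl
use strict;
use warnings;
use Future;

# Hypothetical stand-in for $ua->get($url): returns a pending Future
my %in_flight;
sub fake_get {
    my ($url) = @_;
    return $in_flight{$url} = Future->new;
}

my $response_future = fake_get('http://example.com/');   # start

# ->then() chains a step that runs once the response is there;
# the callback has to return a Future itself
my $length_future = $response_future->then(sub {
    my ($response) = @_;
    return Future->done(length $response);
});

# Simulate the backend delivering the response
$in_flight{'http://example.com/'}->done('<html></html>');

# ->get() hands out the result of a completed Future
my $length = $length_future->get;
print "Body length: $length\n";
```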

Future and Frameworks

->get() calls the backend framework

Still a state machine

... but hidden

"Mechanical" translation

  • "await" in front of Future

  • "await" hides the ->get() call

    my $response_future = $ua->get( $url ); # start
    my $response = $response_future->get(); # use result

becomes

    my $response = await $ua->get( $url );

Advantage of async/await:

    my $f = $ua->get();
    $f->then( sub {
        my( $response ) = @_;
        print "Have result, let's go\n";
    });

becomes

    my $response = await $ua->get();
    print "Have result, let's go\n";

Advantage of async/await:

    my $f = AnyEvent::Future->timer( after => 5 );
    $f->then( sub {
        my( $response ) = @_;
        print "Woke up, let's go\n";
    });

becomes

    my $f = await AnyEvent::Future->timer( after => 5 );
    print "Woke up, let's go\n";

Linear crawler

await/async crawler

AsyncAwait crawler

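    use Future;
    use Future::Utils 'repeat';

    our %seen;
    our @queue = @ARGV;
    ...
    my %pending;              # URL => Future of each request in flight
    my $done = Future->new;   # completes with @results when the crawl ends
    repeat {
        if( defined( my $url = shift @queue )) {
            # await sleep(10);
            my $request = fetch_and_extract( $url );
            $pending{ $url } = $request;
            $request->on_ready(sub { delete $pending{ $url } });
        }

        my $next_action;
        if( @queue ) {
            $next_action = Future->done(1);   # more URLs to start
        } elsif( %pending ) {
            # wait until one request finishes; it may have queued new links
            my $wake = Future->new;
            $_->on_ready(sub { $wake->done(1) unless $wake->is_ready })
                for values %pending;
            $next_action = $wake;
        } else {
            $done->done(@results);
            $next_action = Future->done(0);   # nothing left: stop
        }
        $next_action
    } while => sub { $_[0]->get };
    @results = $done->get;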
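For comparison, the same loop can be written with the async/await keywords. This is a sketch under stated assumptions, not the crawler from the talk: the `%web` hash is an invented in-memory "website", `fetch_links` is a hypothetical stand-in for an asynchronous user agent, and its Futures complete immediately so that the example runs without an event loop.

```perl
use strict;
use warnings;
use Future;
use Future::AsyncAwait;

# Hypothetical in-memory "web": URL => links found on that page
my %web = (
    '/'  => [ '/a', '/b' ],
    '/a' => [ '/b', '/c' ],
    '/b' => [ '/' ],
    '/c' => [],
);

# Stand-in for an asynchronous user agent; a real one would return a
# Future that completes later, driven by the event loop
sub fetch_links {
    my ($url) = @_;
    return Future->done( @{ $web{$url} || [] } );
}

my (%seen, @results);

# Fetch one page, record it, and return the links not seen before
async sub fetch_and_extract {
    my ($url) = @_;
    my @links = await fetch_links($url);
    push @results, $url;
    return grep { !$seen{$_}++ } @links;
}

async sub crawl {
    my @queue = @_;
    $seen{$_}++ for @queue;
    while (@queue) {
        # start all queued requests, then wait for the whole batch
        my @batch = map { fetch_and_extract($_) } splice @queue;
        await Future->wait_all(@batch);
        push @queue, map { $_->get } @batch;
    }
    return @results;
}

my @crawled = crawl('/')->get;
print "$_\n" for @crawled;
```

Waiting for a whole batch with `Future->wait_all` is a simplification chosen to keep the sketch short; it gives up some parallelism compared to refilling the queue as soon as any single request completes.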

Thanks

Questions?

Slides are online:

https://corion.net/talks/

Bonus Section

Advantages / disadvantages: fork vs. async

                     fork                    async
    CPU              any number of CPUs      1 CPU
    Crash            harmless                fatal
    Race conditions  yes                     mostly no
    Framework        Parallel::ForkManager   Future / AnyEvent / Mojolicious / ...
    Global variables no                      yes
    Implementation   trivial                 rewrite
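The fork column can be sketched with Parallel::ForkManager (assuming the module is installed). The worker `fetch_and_extract` below is a hypothetical placeholder that does no real HTTP, and the URLs are examples. The sketch also shows why "global variables" is a "no" for fork: a child cannot write to the parent's `@results`, so the data is handed back when the child finishes.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

# Hypothetical worker standing in for the blocking fetch_and_extract()
sub fetch_and_extract {
    my ($url) = @_;
    return { url => $url, length => length $url };
}

my @urls = ('http://example.com/a', 'http://example.com/bb');
my @results;

my $pm = Parallel::ForkManager->new(2);   # at most 2 child processes

# Children cannot modify the parent's variables, so each result is
# passed back as a data structure when the child finishes
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $signal, $core_dump, $data) = @_;
    push @results, $data if defined $data;
});

for my $url (@urls) {
    $pm->start and next;                    # parent: go on to the next URL
    my $result = fetch_and_extract($url);   # child: do the blocking work
    $pm->finish(0, $result);                # child exits, result goes to parent
}
$pm->wait_all_children;

print "$_->{url}: $_->{length}\n"
    for sort { $a->{url} cmp $b->{url} } @results;
```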

Why not multiple processes (without fork())

  • avoid IPC

  • Use Dancer and WWW::Mechanize::Chrome in the same process

Why not Coro

  • async makes explicit where control can be yielded

  • Only to the direct caller

  • Coro does far more gymnastics

  • In return, it can also do far more

  • Which I neither want nor need

Architecture of a better crawler

  • The queue should be persistent

  • Store request and response (WARC)

          Queue -> In flight -> Extractor -> WARC -> Results
           ^+---- retry ---+      |
           ^+---- new links ------+

  • If you know all the questions that will be asked in the future, why are you a programmer?

Sleep problem

We want to reduce the number of requests again

Wait 5 seconds before the next request is sent

    sleep 5;

Linear crawler + Parallel::ForkManager (the queue has to become a database)

Convert calls to Futures if the API does not already offer them
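One possible way to do such a conversion, sketched with the Future module only: `fetch_with_callback` is an invented callback-style API, and `fetch_as_future` is a hypothetical wrapper that completes a Future from inside the callback.

```perl
use strict;
use warnings;
use Future;

# Hypothetical callback-style API: calls $cb with the result when done
sub fetch_with_callback {
    my ($url, $cb) = @_;
    $cb->("content of $url");
}

# Wrapper that offers the same operation as a Future
sub fetch_as_future {
    my ($url) = @_;
    my $f = Future->new;
    fetch_with_callback($url, sub { $f->done(@_) });
    return $f;
}

my $content = fetch_as_future('http://example.com/')->get;
print "$content\n";
```

With a real asynchronous API the callback would fire later from the event loop; the wrapper itself would stay the same.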